Push Model XML Parser / Node Factory

Development: Chris Lovett
Program Manager: Charles Frankston
Last Update: 1998-06-18

Version: 1.21 (IE5 Beta 1 version is considered 1.0)
Location: hitp:/7xTn^ 

Special thanks to: Gary Burd, Johann Posch, and Scott Cottrille

Beta 1 

We don't recommend writing new code to use the NodeFactory interfaces exposed in Beta 1, 
because they'll be changing. However, if you must get started, the old spec is still at: 
htxp: //xffllweb/msx ir^^ 

New features/changes for IE5 Beta 2 milestone:

We are trying to make it simpler to create node factories in this release. Toward this end we're
making a few changes:

Important update: not all of these features are on the schedule for the IE5 Beta 2
milestone, but we're leaving them in the spec. Eventually, we'll want to do it
all. Features that are not likely to be implemented for IE5 are greyed out. The schema
processing, validation support, and entity expansion will not be in IE5. However, the flag
values are reserved, and CreateNode will have a reserved parameter that we eventually
intend to use to pass the schema argument down.

1. Some customers want to take advantage of our Schema (DTD) processing and
namespace support. Currently, our DTD and namespace processing are too intimately tied
to our own IDOMNode factory. It is not possible for a customer NodeFactory to ask for
namespace and/or schema processing - either you use our IDOMNode interface, or you
do all your own namespace, schema processing, and entity expansion. This revised spec
documents two parser flags, XMLFLAG_PROCESSNAMESPACES and
XMLFLAG_PROCESSSCHEMAS, which allow a customer node factory to request
namespace and schema processing.

2. We have separated the dwType parameter on IXMLFactory::CreateNode into a dwType
and a subType. This makes it easier to uniformly process (or ignore) entire classes of
tags.

3. There is a new parser flag: XMLFLAG_EXPANDENTITIES. If this flag is off, the
behavior is as it was for Beta 1 - the customer NodeFactory receives CreateNode calls
on ENTITYREF objects (i.e. for each "&foo;"). If this flag is set, the parser expands all
entity references for the customer node factory. This makes it easier to write the
common case where the customer NodeFactory application need not be aware of what
content came from entities as opposed to in-line text.

4. The IXMLParser interface is broken out into INodeSource and IXMLParser. The idea is
to allow an application to act as a NodeSource and drive an INodeFactory without having
to implement a full IXMLParser interface.



Introduction - should you use the Node Factory? 



It is called the Push Model parser because the client has to keep pushing it to get the job done.
This way the client can parse the XML asynchronously without using multiple threads or
fibers.

These are low level interfaces that are designed for C programmers only and will not be
scriptable. First there is an IXMLParser interface which is used to parse XML either from an
IStream or directly from an in-memory buffer. Then there is a Node Factory callback interface
that the parser calls for each XML tag or attribute it finds in the source. The client implements
their own Node Factory.

Alternatives 

The Node Factory is not the only way for applications to deal with XML. The IDOMNode
interface (see XMLDOM.htm) maintains an in-memory tree representation of an XML
document, and provides methods for navigating, querying (see XQL Control Object
Model.doc), and modifying the tree. This OM is generally simpler for most applications. In
general, if your application does not need the XML processor to remember the contents of the
XML document - either because you have your own representation, or you are processing the
document and discarding it - then it is more efficient to use the NodeFactory
approach. Otherwise, the IDOMNode will be easier to use.

Information about XML in general is available from http://www.microsoft.com/standards/xml/
and information about our XML plans, IE5 XML, etc. is available from http://xmlweb.

Parser

The Parser takes XML input in a variety of ways (i.e. via a stream, via a URL to a document, or
via text pushed to it), parses the XML and sends parse events to a NodeFactory. The parser is
divided into two interfaces: INodeSource, which defines the parse events and other
information (such as position information for parse errors) that are sent to a NodeFactory, and
IXMLParser, which inherits from INodeSource and adds methods to define the XML
source (stream, URL, or pushed text), set and push NodeFactories, and deal with security and
state reporting issues.



[Figure: IXMLParser (Load, Run, SetURL, PushFactory, etc.) extends INodeSource (Suspend, Reset,
GetLinePosition, etc.) and sends parse events to INodeFactory (CreateNode, etc.).]



interface INodeSource : IUnknown
{
    HRESULT Suspend();

    HRESULT Reset();

    ULONG GetLineNumber();

    ULONG GetLinePosition();

    ULONG GetAbsolutePosition();

    HRESULT GetLineBuffer(
        [out] const WCHAR** ppwcBuf,
        [out] ULONG* pulLen);

    HRESULT GetLastError();

    ULONG GetErrorPosition();

    HRESULT GetErrorReason(
        [out] BSTR* pbstrReason);

    HRESULT SetFlags(
        [in] ULONG iFlags);

    ULONG GetFlags();
};

interface IXMLParser : INodeSource
{
    HRESULT SetURL(
        [in] const WCHAR* pszBaseUrl,
        [in] const WCHAR* pszRelativeUrl,
        [in] BOOL fAsync);

    HRESULT Load(
        [in] BOOL fFullyAvailable,
        [in] IMoniker *pimkName,
        [in] LPBC pibc,
        [in] DWORD grfMode);

    HRESULT SetInput(
        [in] IUnknown *pStm);

    HRESULT PushData(
        [in] const char* pData,
        [in] ULONG ulChars,
        [in] BOOL fLastBuffer);

    HRESULT SetRoot(
        [in] IUnknown* pUnkRoot);

    HRESULT PushFactory(
        [in] INodeFactory* pNodeFactory);

    HRESULT ...(
        [out] INodeFactory** ppNodeFactory);

    HRESULT Run(
        [in] long lChars);

    HRESULT GetReadyState();

    HRESULT Abort(
        [in] BSTR bstrErrorInfo);

    HRESULT GetURL(
        [out] const WCHAR** ppwcBuf);

    // These are still needed to set the secure base URL. Perhaps ...
    // ... as the default if Load() or LoadEntity or SetURL is called
    // with a NULL base URL.
    HRESULT SetSecureURL(
        [in] const WCHAR* pszBaseUrl);

    HRESULT GetSecureURL(
        [out] const WCHAR** ppwcBuf);
};

The parser is multithread safe. In other words, it is safe to call these methods from any thread.

SetURL

This is one of four different methods for providing input to the parser. You pass a base url and
a relative url. For example, the base could be "http://www.microsoft.com/test.htm", with the
relative url resolved against it. The async flag determines whether the document is loaded
synchronously or asynchronously. If it is async, you will get E_PENDING from Run and you
must make sure you pump your message queue. (URLMon requires that you pump the
message queue).

Load

This method corresponds to the Load method in IPersistMoniker. The parser will call 
BindToStorage to get an IStream and load the XML associated with the given moniker. You 
can also call GetURL to get the URL representation of the given Moniker. 

SetInput

Use this method if you already have an IStream containing XML. If the stream returns
E_PENDING then the parser returns E_PENDING from Run(). The caller then must call Run
again at a later time when more data is available.

PushData 

The lowest level way to provide input is as a raw buffer of chars. The buffer is not NULL
terminated. You call PushData with each buffer and set fLastBuffer to TRUE when you push the
last buffer. Run() will continue to return E_PENDING until you do this. It is also ok for a token
to span multiple buffers. For example, the first buffer might end with "<!--foo", in which case
the parser will return E_PENDING; then if the next buffer starts with "bar-->", the parser
will complete the token and call the NodeFactory with the combined COMMENT with text
"foobar".
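
The buffering behaviour described above can be sketched with a toy scanner. This is only an illustration, not the parser's implementation: ToyCommentScanner, its methods, and the kOk/kPending codes are hypothetical stand-ins (for S_OK and E_PENDING), and the scanner recognizes nothing but comment tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the HRESULT values named in the text.
enum ToyResult { kOk = 0, kPending = 1 };

// Toy scanner: accumulates pushed buffers and reports each complete
// "<!--...-->" comment once its terminator has arrived.
class ToyCommentScanner {
public:
    // Append a raw buffer (not NUL terminated, so the length is explicit).
    void PushData(const char* data, std::size_t len, bool lastBuffer) {
        pending_.append(data, len);
        last_ = lastBuffer;
    }

    // Scan whatever is buffered. Returns kPending until the last buffer
    // has been pushed, kOk afterwards.
    ToyResult Run() {
        for (;;) {
            std::size_t open = pending_.find("<!--");
            if (open == std::string::npos) break;
            std::size_t close = pending_.find("-->", open + 4);
            if (close == std::string::npos) break;  // token spans buffers: keep it
            comments_.push_back(pending_.substr(open + 4, close - open - 4));
            pending_.erase(0, close + 3);
        }
        return last_ ? kOk : kPending;
    }

    const std::vector<std::string>& comments() const { return comments_; }

private:
    std::string pending_;                 // unconsumed input, possibly a partial token
    bool last_ = false;                   // true once the final buffer was pushed
    std::vector<std::string> comments_;   // text of each completed comment
};
```

The design point is that an incomplete token is never discarded: it stays in the pending buffer until a later push supplies the rest.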

SetRoot 

This sets the top-most node which is used as the pUnkParent in all top level CreateNode calls.
The XML Object Model, for example, could pass in the IXMLDocument for this.

PushFactory

The parser delegates to a given NodeFactory for each element it finds in the XML source. See
NodeFactory Interface for details. More than one node factory can also be pushed and they will
be automatically popped by the parser based on the scope in which they were pushed. This
could be used, for example, to provide different node factories to handle different namespaces
within the document.
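
One way to picture scope-based popping is a stack whose entries remember the element depth at which they were pushed. The sketch below is an assumption about how such bookkeeping could work, not the parser's actual mechanism; ToyFactory, ToyFactoryStack, BeginElement and EndElement are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a node factory; only its identity matters here.
struct ToyFactory { std::string name; };

// Tracks which factory is current. Each pushed factory remembers the
// element depth at which it was pushed and is popped when that element ends.
class ToyFactoryStack {
    struct Entry { ToyFactory* factory; std::size_t depth; };

public:
    explicit ToyFactoryStack(ToyFactory* root) {
        Entry e = {root, 0};
        stack_.push_back(e);
    }

    // Push a factory scoped to the element currently being processed.
    void PushFactory(ToyFactory* f) {
        Entry e = {f, depth_};
        stack_.push_back(e);
    }

    void BeginElement() { ++depth_; }

    void EndElement() {
        assert(depth_ > 0);
        // Pop every factory that was pushed inside the element now closing.
        while (stack_.size() > 1 && stack_.back().depth >= depth_) stack_.pop_back();
        --depth_;
    }

    ToyFactory* Current() const { return stack_.back().factory; }

private:
    std::vector<Entry> stack_;
    std::size_t depth_ = 0;
};
```

The root factory is never popped, so Current() always has something to return.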

Run 

Run parses the specified amount of XML (in characters), then it returns E_PENDING if it isn't
finished or it returns S_OK when it reaches the end of the input. An amount of -1 means parse
as much as it can get from the input stream. If you are using PushData, the Run method returns
E_PENDING until you push the last buffer and then after that it will return S_OK. The number
given to Run is just a hint - if there are not the specified number of characters left in the input
buffer then Run will still succeed.
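
The Run contract can be modelled with a small stand-in that only counts characters. This is a sketch under stated assumptions: ToyRunner, Pump, and the kOk/kPending codes are hypothetical (standing in for S_OK and E_PENDING), and no XML is actually parsed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for S_OK / E_PENDING.
enum ToyResult { kOk = 0, kPending = 1 };

class ToyRunner {
public:
    void PushData(const std::string& chunk, bool lastBuffer) {
        input_ += chunk;
        last_ = lastBuffer;
    }

    // Consume up to `chars` characters (-1 means everything available).
    // The count is only a hint: having fewer characters left is not an error.
    ToyResult Run(long chars) {
        std::size_t available = input_.size() - consumed_;
        std::size_t want = (chars < 0) ? available : static_cast<std::size_t>(chars);
        consumed_ += (want < available) ? want : available;
        bool atEnd = last_ && consumed_ == input_.size();
        return atEnd ? kOk : kPending;
    }

    std::size_t consumed() const { return consumed_; }

private:
    std::string input_;
    std::size_t consumed_ = 0;
    bool last_ = false;
};

// Calls Run(slice) repeatedly; gives up after maxCalls so a caller that
// never pushes the last buffer cannot spin forever. Returns the calls made.
inline int Pump(ToyRunner& p, long slice, int maxCalls) {
    int calls = 0;
    while (calls < maxCalls) {
        ++calls;
        if (p.Run(slice) == kOk) break;
    }
    return calls;
}
```

A real client would interleave such Run calls with other work, which is what makes single-threaded asynchronous parsing possible.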

If you used SetURL or Load to load a remote XML document, Run(-1) will return E_PENDING, but
then so long as your application has a message loop, the download and parsing will occur in the
background as data becomes available from URLMON. Then you can check GetReadyState
from time to time, and at some point the parser will no longer return XMLPARSER_BUSY. If
you want to monitor the progress you can use the Load method and register your own
IBindStatusCallback and watch the OnProgress calls.

GetReadyState

GetReadyState returns one of the following values indicating the state of the parser:

• IDLE if the parser is in the Reset state,

• WAITING if the input stream returned E_PENDING,

• BUSY if there is data available for parsing,

• ERROR if the parser found an error,

• STOPPED if the parser was aborted, and

• SUSPENDED if it is currently suspended.

Abort, Suspend, Reset

The parser can be stopped by the caller or by the NodeFactory at any time by calling Abort. If
the NodeFactory calls Abort, it can also pass a BSTR that contains more
information about why the node factory is aborting the parse. This will then be returned from
GetErrorInfo.

The parser can also be suspended at any time and resumed by calling Run again. This helps
clients tweak their performance by doing just-in-time parsing.

... may be in the current parser context.

GetLineNumber, GetLinePosition, GetLineBuffer, GetAbsolutePosition, GetLastError,
GetErrorInfo, GetErrorPosition

... and GetErrorInfo returns a BSTR which contains a more descriptive account of what
went wrong. GetErrorPosition returns the offset of the error in characters from the start of
GetLineBuffer's line buffer.

SetFlags, GetFlags

The following flags can be ORed together to control how the parser works. By default none of
these flags are set.

XMLFLAG_NONAMESPACES

Whether the namespace declarations and tag name syntax are recognized.

XMLFLAG_NOWHITESPACE

Whether to return WHITESPACE nodes.

XMLFLAG_CASEINSENSITIVE

This makes the parser case insensitive - but the text you get is still whatever was in
the original file. It does NOT fold to upper case since this is inefficient and is not
always needed.

XMLFLAG_IE4COMPATIBILITY

Turn on all IE4 compatibility flags, which is an OR of all of the above
(XMLFLAG_CASEINSENSITIVE, XMLFLAG_NONAMESPACES,
XMLFLAG_NOWHITESPACE) plus:

- allow PCDATA to contain unescaped ampersand characters

- allow short end tags </>

- normalize whitespace in attribute values and PCDATA

- allow SGML comment syntax ('--') inside a comment

- allow duplicate attributes

- allow comments before the xml declaration, and so forth

Note that enabling IE4 compatibility makes the parser run 11% slower.

XMLFLAG_PROCESSNAMESPACES

Whether to provide namespace processing. See Standard Namespace Processing.

The stuff below this line isn't being implemented for IE5. However we're keeping
the design here for the release after IE5.

XMLFLAG_PROCESSSCHEMAS

Whether to provide schema processing. This flag causes the parser to
automatically plug in the Standard Schema Processing.

XMLFLAG_VALIDATE

Tells the schema processor whether to report validation errors against any schema
present in the document instance. Note that it is perfectly sensible to request
validation without Schema information, Schema information without validation, or
both validation and schema information.

XMLFLAG_EXPANDENTITIES

If this flag is set, the XMLParser expands all entity references for the customer
node factory. If this flag is off, the behavior is as it was for Beta 1 - the customer
NodeFactory receives CreateNode calls on ENTITYREF objects (i.e. for each
"&foo;"). Setting this flag makes it easier to write the common case where the
customer NodeFactory application need not be aware of what content came from
entities as opposed to in-line text. Note that when entity expansion is off, the
XMLParser will call the customer node factory with a stream of events
representing the XML content that comes from entity declarations. These events
will be delivered immediately after all schemas are processed, but before the root
node of the XML Document (note this is not when the entity declaration is
encountered in the DTD or Schema - for obscure XML reasons the parse nature
of the entity could be changed by later schema information, so all entities must be
processed after all other schema information). This will give the customer node
factory that is doing its own entity processing a chance to construct and save entity
information in its own format, possibly to refer to when ENTITYREF events are
received later in the document.
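
Combining and testing flags follows the usual bit-mask pattern. The sketch below is illustrative only: the spec does not give numeric values for the XMLFLAG_* constants, so the TOYFLAG_* values are invented, and ToyFlagHolder (including the assumption that SetFlags replaces the previous value) is hypothetical.

```cpp
#include <cassert>

// Illustrative bit values only: the spec above does not state the numeric
// values of the XMLFLAG_* constants.
enum ToyXmlFlags : unsigned long {
    TOYFLAG_NONAMESPACES    = 0x1,
    TOYFLAG_NOWHITESPACE    = 0x2,
    TOYFLAG_CASEINSENSITIVE = 0x4,
    TOYFLAG_EXPANDENTITIES  = 0x8
};

// Hypothetical holder mirroring the SetFlags/GetFlags pair.
class ToyFlagHolder {
public:
    // Assumption: SetFlags replaces the whole flag word.
    void SetFlags(unsigned long flags) { flags_ = flags; }
    unsigned long GetFlags() const { return flags_; }
    // True only if every bit of `flag` is set.
    bool IsSet(unsigned long flag) const { return (flags_ & flag) == flag; }

private:
    unsigned long flags_ = 0;  // by default none of the flags are set
};
```

A caller would OR the wanted flags into one value and pass it in a single SetFlags call.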

GetURL 

In the case where SetURL() or Load() was called, this method returns the URL that the parser
is loading.

Encoding Support

For all the different methods of input, the parser detects the character encoding of the input.
If the input doesn't start with a Unicode byte order mark (0xFEFF), it is assumed to be UTF-8.
An XML declaration can also name the encoding of the input. For example:

<?xml version="1.0" encoding="..."?>

The XML Parser recognizes UTF-8 and Unicode directly. For all other character sets, MLang
is used to perform the conversion.

Example Parser Usage

    HRESULT hr;
    IXMLParser* xp;
    checkhr(CoCreateInstance(CLSID_XMLParser, NULL, CLSCTX_INPROC_SERVER,
        IID_IXMLParser, (void**)&xp));
    checkhr(xp->SetInput(input));
    checkhr(xp->PushFactory(f));
    hr = xp->Run(-1);
    return hr;

interface INodeFactory : IUnknown
{
    HRESULT NotifyStart(
        [in] INodeSource *pNodeSource,
        [in] NodeFactoryEvent eEvent);

    HRESULT NotifyEnd(
        [in] NodeFactoryEvent eEvent);

    HRESULT EndChildren(
        [in] INodeSource *pNodeSource,
        [in] BOOL fEmptyNode,
        [in] DWORD dwType,
        [in] const WCHAR* pwcText,
        [in] ULONG ulLen,
        [in] IUnknown* pUnkNode);

    HRESULT Error(
        [in] INodeSource *pNodeSource,
        [in] HRESULT hrErrorCode);

    HRESULT CreateNode(
        [in] INodeSource *pNodeSource,
        [in] IUnknown* pUnkParent,
        [in] IUnknown* pUnkOuter,
        [in] DWORD dwType,
        [in, out] DWORD* dwSubType,
        [in] const WCHAR* pwszText,      // tag name
        [in] ULONG ulText,               // length
        [in] const WCHAR* pwNsPrefix,    // namespace prefix
        [in] ULONG ulNsPrefix,           // prefix length
        [in] NameSpace *pNS,             // namespace object
        [in] IUnknown *pSchema,          // Schema object (future)
        [out] IUnknown** ppUnkChild);
};



The NodeFactory is responsible for creating nodes based on the information provided by the
XML Parser. This is essentially a callback interface so that custom node factories can be
provided that build different kinds of object hierarchies. Notice that the nodes returned are any
IUnknown object - which means this is not tied specifically to the XML Object Model in any
way. The XML Parser assumes nothing about the returned IUnknown objects, which means the
NodeFactory can also return NULL.

NotifyStart, NotifyEnd

Convenience methods where the NodeFactory can do some initialization and cleanup. This is
particularly useful if the same NodeFactory is being used to load multiple documents. Each takes a
NodeFactoryEvent parameter, which is one of:

typedef enum
{
    NFE_DOCUMENT = 1,       // document started or ended
    NFE_DTD,                // external dtd started or ended
    NFE_INTERNALSUBSET      // internal subset DTD started or ended
} NodeFactoryEvent;

EndAttributes

This method is called when all the attributes for a given element are complete. In other words
the greater-than character (>) has been reached. This method is not called for terminal nodes
but may be called for empty elements (e.g. <foo id="123"/>).

EndChildren

This method is called when all the subelements of the given element are complete. In other
words the matching end tag </FOO> has been reached. This is also called if the tag is an empty
tag <FOO ... /> in which case the fEmpty argument is set to TRUE in case the NodeFactory
needs to distinguish between this case and <FOO></FOO>. This method is not called for
terminal nodes.



Error 



This method is called when the parser runs into an error in the XML document. The parser will
stop at this point and return the HRESULT error code to the caller. The NodeFactory can call
back to the parser to get more information about the error.

CreateNode

INodeSource* pNodeSource
    The parser is passed into each node factory call so that the node factory can
    call back and get important information - like the current line number - or
    stop the parser.

IUnknown* pUnkParent
    This is the parent of the node being created. This is either the node returned
    from a previous CreateNode call, or the root object provided using
    the parser SetRoot method.

IUnknown* pUnkOuter
    If the node is being created as part of an aggregation, this is the
    controlling IUnknown, otherwise pUnkOuter is NULL. In the case of
    aggregation, the returned ppUnkChild is the IUnknown of the created node
    and QI on ppUnkChild will yield the desired type - for example,
    IDOMNode.

DWORD dwType
    See table of node types below in CreateNode Reference.

DWORD dwSubType
    See table of node types below in CreateNode Reference.

const WCHAR* pwcText
    This is either a tag name or a PCDATA text value. For a tag name, the lifetime of the
    pointer is the lifetime of the IXMLParser that is the driving application. If
    the text is not a tag name, then the lifetime is even shorter - only the
    duration of this CreateNode call. Note that Element/Attribute/PI tag names
    and certain attribute values (of type ID, ..., or
    NOTATION) may have namespace prefixes. For this parameter, the
    prefixes will be stripped off and passed in the pwcNsPrefix instead. E.g. if a
    tag is named "foo:bar", this parameter will get "bar", and pwcNsPrefix
    will get "foo".
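
The prefix stripping described for the text parameter amounts to splitting a qualified name at its colon. A minimal sketch, assuming the split happens at the first colon; SplitQualifiedName is a hypothetical helper, not part of the interfaces in this spec.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Splits "prefix:local" at the first colon. Returns {prefix, local};
// the prefix is empty when the name contains no colon.
inline std::pair<std::wstring, std::wstring> SplitQualifiedName(const std::wstring& name) {
    std::wstring::size_type colon = name.find(L':');
    if (colon == std::wstring::npos) {
        return std::make_pair(std::wstring(), name);
    }
    return std::make_pair(name.substr(0, colon), name.substr(colon + 1));
}
```

For "foo:bar" this yields the prefix "foo" and the local name "bar", matching the example in the text.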

ULONG ulLen
    This is the length of the text argument.

const WCHAR* pwcNsPrefix
    This is the namespace part of the tag name or attribute value (see
    explanation under pwcText). This string is also NUL terminated, and
    "atomized".

ULONG ulLen
    This is the length of the pwcNsPrefix argument.

IUnknown** ppUnkChild
    The NodeFactory may or may not create a node object representing the
    above information and return it here. If a non-NULL pointer is returned then
    this may be used as the parent in subsequent calls to CreateNode. In other
    words, the XML Parser maintains the parse context for you and passes in the
    appropriate parent pointer based on what it finds in the XML. Notice also
    that this is an IUnknown - so the NodeFactory need not be building the
    XML Object Model.

NameSpace * pNS
    Points to an object of type NameSpace (see Standard Namespace
    Processing below), used to communicate the full namespace information
    about the current name. Unless XMLFLAG_PROCESSNAMESPACES is
    set, this parameter is always NULL.

IUnknown *pSchema
    For IE5 always NULL. In the future:
    For element and attribute types only, points to an IDOMNode object, which
    is an XML-Data style schema node representing the schema information for
    the node being built. See Standard Schema Processing below. Unless
    XMLFLAG_PROCESSSCHEMAS is set, this parameter is always NULL.

HRESULT
    If anything other than S_OK is returned, then the parser stops and returns the
    same error from the parser Run() method. So you could return E_PENDING
    if your node factory itself is loading some other XML file and is still waiting
    for data. In fact, this is exactly what a DTDNodeFactory would do.

So for example, given the following XML fragment:

<item id="FOO" ms:price="20" title="BAR &foo;">
    <value>The quick brown fox</value>
</item>

The following sequence of calls will be made on the node factory (indented for readability only
- does not imply recursion):

This probably needs updating. -cfranks

CreateNode(parser, root, ELEMENT, "item", 4, 0, &item);
    CreateNode(parser, item, ATTRIBUTE, "id", 2, 0, &id);
        CreateNode(parser, id, PCDATA, "FOO", 3, &value);
    EndChildren(parser, id);
    CreateNode(parser, item, ATTRIBUTE, "ms:price", 8, 2, &price);
        CreateNode(parser, price, PCDATA, "20", 2, &value);
    EndChildren(parser, price);
    CreateNode(parser, item, ATTRIBUTE, "title", 5, 0, &title);
        CreateNode(parser, title, PCDATA, "BAR", 3, &value);
        CreateNode(parser, title, ENTITYREF, "foo", 3, &value);
    EndChildren(parser, title);
EndAttributes(parser, item);
    CreateNode(parser, item, WHITESPACE, "0xd 0xa 0x9", 3, &value);
    CreateNode(parser, item, ELEMENT, "value", 5, 0, &value);
        CreateNode(parser, value, TEXT, "The quick brown fox", 19, &text);
    EndAttributes(parser, value);
    CreateNode(parser, item, WHITESPACE, "0xd 0xa", 2, &value);
    EndChildren(parser, value);
EndAttributes(parser, item);
EndChildren(parser, item);
EndChildren(parser, root);

So notice that attributes can also have children, just like elements.

Example NodeFactory

I'm sure this needs updating. -cfranks

A NodeFactory for the CDF Viewer that scans for the LOGO ICON HREF would be
implemented with the following pseudo-code:

class FindIconNodeFactory : public INodeFactory
{
    ULONG level;
    bool inLogoTag;
    BSTR attribute;
    BSTR href;
    bool found;
    int seats;

    FindIconNodeFactory()
    {
        level = 0; inLogoTag = false;
        href = NULL; found = false; attribute = NULL;
    }

    HRESULT CreateNode(IXMLParser* p, IUnknown* parent,
        DWORD dwType, DWORD dwSubType,
        const WCHAR* text, ULONG len, ULONG nslen,
        IUnknown** pChild, NameSpace *pNS)
    {
        if (!terminal)
        {
            if (level == 2 && dwType == XML_ELEMENT && strcmpi(text, "LOGO") == 0)
            {
                inLogoTag = true;
            }
            else if (inLogoTag && dwType == XML_ATTRIBUTE)
            {
                // Remember which attribute the following PCDATA belongs to.
                ::SysFreeString(attribute);
                attribute = ::SysAllocStringLen(text, len);
            }
        }
        else
        {
            if (inLogoTag && dwType == XML_PCDATA)
            {
                if (strcmpi(attribute, "STYLE") == 0 && strcmpi(text, "ICON") == 0)
                {
                    // We found what we are looking for but we don't
                    // terminate until all attributes are complete so
                    // that we are sure to get the HREF attribute no
                    // matter what order the attributes were specified.
                    found = true;
                }
                else if (strcmpi(attribute, "HREF") == 0)
                {
                    ::SysFreeString(href);
                    href = ::SysAllocStringLen(text, len);
                }
            }
        }
        *pChild = NULL; // No tree creation
        return S_OK;
    }

    HRESULT EndChildren(IXMLParser* p, IUnknown* node)
    {
        level--;
    }

    HRESULT EndAttributes(IXMLParser* p, IUnknown* node)
    {
        if (inLogoTag && found)
        {
            p->Abort(NULL); // We're done
        }
        inLogoTag = false;
    }

    // Other methods left out of example since they are not needed in this case
}

What this does is scan through the following XML to extract the highlighted information:

<?XML VERSION="1.0"?>
<CHANNEL HREF="http://www.msnbc.com">
<SELF HREF="http://www.microsoft.com/ie/ie40/msnbc/msnbc.cdf" />
<TITLE>The MSNBC Channel</TITLE>
<LOGO HREF="http://www.microsoft.com/ie/ie40/msnbc/msnbc.gif" STYLE="IMAGE" />
<LOGO HREF="http://www.microsoft.com/ie/ie40/msnbc/msnbc.ico" STYLE="ICON" />

Then it aborts the parse - so this will usually complete within the first buffer of input, which is
clearly a lot more efficient than loading the entire document.

Standard Namespace Processing 

If XMLFLAG_PROCESSNAMESPACES is set, then the parser does namespace processing
prior to the user's node factory receiving events. All that this means is that the processor will
recognize the <?xml:namespace ... > PI and remember its contents in a NameSpace
object. Note that all the strings in the namespace object are NUL terminated, and "atomized" -
meaning that equal string values will have equal pointer values (i.e. there is only one copy of a
string). The lifetime of a NameSpace object passed down in a CreateNode call is the
lifetime of the CreateNode call. Customer Node Factories should not remember the
NameSpace object. However, the strings pointed to by the member variables of the
NameSpace object have the same lifetime as the IXMLParser.

struct NameSpace
{
    const WCHAR * szNS;      // the ns part of a namespace PI
    const WCHAR * szPrefix;  // the prefix part of a namespace PI
    const WCHAR * szSrc;     // the source part of a namespace PI
};
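
"Atomized" strings, where equal values share one pointer, are commonly produced by an intern table. The sketch below shows that general technique; it is not a description of how the parser stores its strings, and ToyAtomTable is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Toy atom table: returns one canonical pointer per distinct string value.
class ToyAtomTable {
public:
    const wchar_t* Atomize(const std::wstring& s) {
        // insert() leaves the set unchanged if an equal string is already
        // present; either way the iterator refers to the stored copy.
        // Elements of an unordered_set are never moved, so the pointer
        // stays valid for the lifetime of the table.
        return atoms_.insert(s).first->c_str();
    }

private:
    std::unordered_set<std::wstring> atoms_;
};
```

With atomized strings a node factory can compare names by pointer instead of comparing characters, provided both pointers come from the same table.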

Standard Schema Processing

NOT BEING IMPLEMENTED FOR IE5.

If XMLFLAG_PROCESSSCHEMAS is set, then the parser does schema processing prior to
the user's node factory receiving events. The parser will then handle
XML-Data schemas referenced by namespace PIs, and DTDs referenced by
DOCTYPE (both internal and external subsets). Schema information is passed to the customer
node factory in the schema parameter, which points to an IDOMNode object that is the XML-
Data style representation of the schema for that node (see ... and
...). DTD schema are converted to XML-Data schema information
internally. The following events will be accompanied by Schema information (by dwType):

dwType          pReserved parameter
--------------  ------------------------------------------------------------------------
XML_ELEMENT     IDOMNode * pointing to the <ElementType ...> declaration for the
                element.

XML_ENTITYREF   (Only if XMLFLAG_EXPANDENTITIES is 0). IDOMNode * pointing to
                <Entity ...> declaration for the entity. Note for external parsed entities
                the contents of the external parsed entity will be available by following
                the definition parameter of the entity to the loaded document (if
                available) containing the external parsed entity.

XML_PI          (Subtype = XML_NAMESPACEDECL, i.e. namespace PI only). IDOMNode *
                pointing to the root of the external schema (if found). I.e. <Schema ...>.

XML_DOCTYPE     IDOMNode * pointing to the root of an XML-Data style schema formed
                by merging the external and internal subsets, and converting both to an
                XML-Data style schema. (Note that parameter entities are of course
                expanded before this happens - their original definition cannot be
                recovered).

With this flag on, customer node factories will receive no events concerning the internals of
schema information. I.e. if there is an internal DTD subset in the document, it will be
internally processed by the Schema Node Factory, and the customer node factory will receive
no events for these. This decision is made because we believe that customer node factories
will by and large have such little interest in or knowledge of DTD syntax that it is best not to
even burden them with filtering it out.



CreateNode Reference 

This is the complete set of node types and when they occur. There are two main types of nodes,
those that can have children (containers) and those that cannot (terminal nodes).

The second column in the tables below, labelled "Which node factories will output this tag?"
says which factories will call CreateNode on that tag type. If a node factory doesn't show up in
a particular column, that means it filters that node type. This really only applies to the schema
node factory, which will filter out all namespace PIs and Schema information.

(The dwSubTypes in bold aren't in Chris Lovett's IDL file yet -- I think we need these).

Containers

dwType          dwSubType           Which node factories          When occurs
                                    output this tag?
--------------  ------------------  ----------------------------  ------------------------------------
XML_ELEMENT     0                   standard, namespace, schema   At the start of a new element <foo...

XML_ATTRIBUTE   0                   standard, namespace, schema   At the start of a new attribute
                                                                  FOO='...'

                XML_SYSTEM          standard, namespace           SYSTEM literal (applicable in
                                                                  <!ENTITY> or <!DOCTYPE> only)

                XML_PUBLIC          standard, namespace           PUBLIC literal (applicable in
                                                                  <!ENTITY> or <!DOCTYPE> only)

                XML_VERSION         standard, namespace           XML version string (applicable in
                                                                  <?xml> PI only)

                XML_ENCODING        standard, namespace           XML encoding string (applicable in
                                                                  <?xml> PI only)

                XML_STANDALONE      standard, namespace           XML standalone string (applicable in
                                                                  <?xml> PI only)

                XML_NS              standard, namespace           xml:namespace ns part (applicable in
                                                                  <?xml:namespace PI only)

                XML_SRC             standard, namespace           xml:namespace src part (applicable in
                                                                  <?xml:namespace PI only)

                XML_PREFIX          standard, namespace           xml:namespace prefix part (applicable
                                                                  in <?xml:namespace PI only)

                XML_XMLSPACE        standard, namespace, schema   The special global xml:xmlspace
                                                                  attribute.

                XML_XMLLANG         standard, namespace, schema   The special global xml:xmllang
                                                                  attribute.

XML_PI          0                   standard, namespace, schema   The start of a <?foo with text equal
                                                                  to PI name.

                XML_XMLDECL         standard, namespace           This is the special <?xml ... PI. In
                                                                  this case the parser parses the
                                                                  version, encoding and standalone
                                                                  attributes and passes these as special
                                                                  node types. See XML_VERSION,
                                                                  XML_ENCODING and XML_STANDALONE
                                                                  below.

                XML_NAMESPACEDECL   standard, namespace           This is the special <?xml:namespace...
                                                                  PI. In this case the parser parses the
                                                                  name, href and as attributes and
                                                                  passes these as special node types.
                                                                  See XML_NSNAME, XML_HREF and XML_AS
                                                                  below.

XML_DOCTYPE     0                   standard, namespace           This is the special <!DOCTYPE
                                                                  declaration. The text is the doc type
                                                                  name and this will be followed by an
                                                                  optional XML_PUBLIC or XML_SYSTEM node
                                                                  and an optional XML_DTDSUBSET. If
                                                                  there is a 'PUBLIC' keyword you will
                                                                  get XML_PUBLIC with the text of the
                                                                  public id followed by XML_SYSTEM with
                                                                  the text of the system literal.



Terminal Nodes 



dwType         dwSubType             Which node factories will           When occurs
                                     output this tag?

XML_TEXT       XML_PCDATA            standard, namespace, & schema       The text inside a node or an attribute.
               XML_CDATA             standard, namespace, & schema       The text inside <![CDATA[...]]>
               XML_WHITESPACE        standard, namespace, & schema       White space between elements
               XML_ID                standard, namespace, & schema       An attribute value of type ID
               XML_IDREF             standard, namespace, & schema       An attribute value of type IDREF
               XML_ENTITY            standard, namespace, & schema       An attribute value of type ENTITY
               XML_NMTOKEN           standard, namespace, & schema       An attribute value of type NMTOKEN
               XML_NOTATION          standard, namespace, & schema       An attribute value of type NOTATION

XML_NAME                             standard, namespace                 General name token for typed attribute
                                                                         values or DTD declarations

XML_STRING                           standard, namespace                 General quoted string literal in DTD
                                                                         declarations.

XML_COMMENT                          standard, namespace, & schema       The text inside '<!--' and '-->'

XML_ENTITYREF  XML_NAMEDENTITYREF    standard, namespace, & schema       A named entity node, &foo;
                                     (with dontexpandentities flag on)
               XML_GENERALENTITYREF  standard, namespace, & schema       A numeric entity, &#23;
                                     (with dontexpandentities flag on)
               XML_NUMENTITYREF      standard, namespace, & schema       A numeric entity, &#23;
                                     (with dontexpandentities flag on)
               XML_HEXENTITYREF      standard, namespace, & schema       A hexadecimal entity, &#x23;
                                     (with dontexpandentities flag on)
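
As a rough illustration of how a node factory might use the dwType / dwSubType split for terminal nodes, the sketch below filters on dwType first and only then looks at the subtype. The enum declarations are hypothetical stand-ins (this document does not give the numeric values, and XML_NONE is an invented "no subtype" marker), and KeepAsText is an illustrative helper, not part of the interface.

```cpp
// Hypothetical stand-ins for the terminal node types listed above.
enum NodeType { XML_TEXT, XML_NAME, XML_STRING, XML_COMMENT, XML_ENTITYREF };

// Hypothetical stand-ins for the subtypes; XML_NONE is invented here to
// represent rows of the table that have no dwSubType.
enum NodeSubType {
    XML_NONE,
    XML_PCDATA, XML_CDATA, XML_WHITESPACE,
    XML_ID, XML_IDREF, XML_ENTITY, XML_NMTOKEN, XML_NOTATION,
    XML_NAMEDENTITYREF, XML_GENERALENTITYREF,
    XML_NUMENTITYREF, XML_HEXENTITYREF
};

// Decide whether a terminal node is character content that a simple
// text-collecting factory would keep. Testing dwType first lets the
// factory ignore a whole class of nodes without inspecting the subtype.
bool KeepAsText(NodeType type, NodeSubType subType)
{
    if (type != XML_TEXT) {
        return false;  // names, strings, comments, entity references
    }
    switch (subType) {
    case XML_PCDATA:
    case XML_CDATA:
        return true;   // ordinary text and CDATA sections
    default:
        return false;  // white space and typed attribute values
    }
}
```

A factory that only cares about comments could make the same kind of decision with a single comparison against XML_COMMENT, since that type has no subtypes.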



Errors 



typedef enum 
{ 
    XML_E_PARSEERRORBASE = 0x00000000,  // NEED AN OFFICIAL RANGE! 
    XML_E_ENDOFINPUT = XML_E_PARSEERRORBASE, 
    XML_E_UNCLOSEDPI,                   // 1 
    XML_E_MISSINGEQUALS,                // 2 
    XML_E_UNCLOSEDSTARTTAG,             // 3 
    XML_E_UNCLOSEDENDTAG,               // 4 
    XML_E_UNCLOSEDSTRING,               // 5 
    XML_E_MISSINGQUOTE,                 // 6 
    XML_E_COMMENTSYNTAX,                // 7 
    XML_E_UNCLOSEDCOMMENT,              // 8 
    XML_E_BADSTARTNAMECHAR,             // 9 
    XML_E_BADNAMECHAR,                  // A 
    XML_E_UNCLOSEDDECL,                 // B 
    XML_E_BADCHARINSTRING,              // C 
    XML_E_BADCHARDATA,                  // D 
    XML_E_UNCLOSEDMARKUPDECL,           // E 
    XML_E_UNCLOSEDCDATA,                // F 
    XML_E_MISSINGWHITESPACE,            // 10 
    XML_E_BADDECLNAME,                  // 11 
    XML_E_BADEXTERNALID,                // 12 
    XML_E_EXPECTINGTAGEND,              // 13 
    XML_E_BADCHARINDTD,                 // 14 
    XML_E_BADELEMENTINDTD,              // 15 
    // 16 (entry not legible in this copy) 
    XML_E_BADCHARINDECL,                // 17 
    // 18 to 20 (two entries in this range are not legible in this copy) 
    XML_E_BADCHARINENTREF, 
    XML_E_UNBALANCEDPAREN, 
    XML_E_EXPECTINGOPENBRACKET, 
    XML_E_BADENDCONDSECT, 
    XML_E_RESERVEDNAMESPACE, 
    XML_E_INTERNALERROR, 
    XML_E_EXPECTING_ENCODING, 
    XML_E_UNEXPECTED_ATTRIBUTE_IN_NAMESPACEDECL, // 21 
    XML_E_EXPECTING_NAME,               // 22 
    XML_E_NAMESPACEDECLSYNTAX,          // 23 
    XML_E_UNEXPECTED_WHITESPACE,        // 24 
    XML_E_UNEXPECTED_ATTRIBUTE,         // 25 
    XML_E_SUSPENDED,                    // 26 
    XML_E_STOPPED,                      // 27 
    XML_E_UNEXPECTEDENDTAG,             // 28 
    XML_E_ENDTAGMISMATCH,               // 29 
    XML_E_UNCLOSEDTAG,                  // 2A 
    XML_E_DUPLICATEATTRIBUTE,           // 2B 
    XML_E_MULTIPLEROOTS,                // 2C 
    XML_E_INVALIDATROOTLEVEL,           // 2D 
    XML_E_BADXMLDECL,                   // 2E 
    XML_E_INVALIDENCODING,              // 2F 
    XML_E_INVALIDSWITCH,                // 30 
    XML_E_MISSINGROOT,                  // 31 
    XML_E_INCOMPLETE_ENCODING,          // 32 
    XML_E_EXPECTING_NDATA,              // 33 
    XML_E_INVALID_MODEL,                // 34 
    XML_E_BADCHARINMIXEDMODEL,          // 35 
    XML_E_MISSING_STAR,                 // 36 
    XML_E_BADCHARINMODEL,               // 37 
    XML_E_MISSING_PAREN,                // 38 
    XML_E_INVALID_TYPE,                 // 39 
    XML_E_INVALIDXMLSPACE,              // 3A 
    XML_E_MULTI_ATTR_VALUE,             // 3B 
    XML_E_INVALID_PRESENCE,             // 3C 
    XML_E_BADCHARINENUMERATION,         // 3D 
    XML_E_UNEXPECTEDEOF,                // 3E 
    XML_E_BADPEREFINSUBSET,             // 3F 
    XML_E_BADXMLCASE,                   // 40 
    XML_E_CONDSECTINSUBSET,             // 41 
    XML_E_CDATAINVALID,                 // 42 
    XML_E_INVALID_STANDALONE,           // 43 
    XML_E_PE_NESTING,                   // 44 
    XML_E_UNEXPECTED_STANDALONE,        // 45 
    // 46 to 49 (two entries in this range are not legible in this copy) 
    XML_E_PIDECLSYNTAX, 
    XML_E_EXPECTINGCLOSEQUOTE, 
    XML_E_DTDELEMENT_OUTSIDE_DTD,       // 4A 
    XML_E_DUPLICATEDOCTYPE,             // 4B 
    XML_E_MISSING_ENTITY,               // 4C 
    XML_E_ENTITYREF_INNAME,             // 4D 
    XML_E_LASTERROR, 
} XMLErrorCode; 
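
Because every code in the enum is defined relative to XML_E_PARSEERRORBASE, a client can turn a code into a stable index even if the base is later moved to an official range. The following is a minimal sketch under stated assumptions: it copies only the first few codes, treats XML_E_LASTERROR as a one-past-the-end sentinel (the spec does not say so explicitly), and the helper names IsParserError and ErrorIndex are hypothetical.

```cpp
// Abbreviated copy of the enum above; only the first few codes are shown.
enum XMLErrorCode {
    XML_E_PARSEERRORBASE = 0x00000000,  // placeholder base, as in the spec
    XML_E_ENDOFINPUT = XML_E_PARSEERRORBASE,
    XML_E_UNCLOSEDPI,
    XML_E_MISSINGEQUALS,
    XML_E_UNCLOSEDSTARTTAG,
    XML_E_UNCLOSEDENDTAG,
    XML_E_LASTERROR                     // assumed: one past the last code
};

// True if 'code' falls inside the parser's error range.
bool IsParserError(long code)
{
    return code >= XML_E_PARSEERRORBASE && code < XML_E_LASTERROR;
}

// Offset of 'code' from the base; unaffected by a future change of base.
long ErrorIndex(long code)
{
    return code - XML_E_PARSEERRORBASE;
}
```

Keeping comparisons relative to the base means client code would not need to change when the "NEED AN OFFICIAL RANGE" note is resolved.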

// Possible ready states 
typedef enum 
{ 
    XMLPARSER_IDLE, 
    XMLPARSER_WAITING, 
    XMLPARSER_BUSY, 
    XMLPARSER_ERROR, 
    XMLPARSER_STOPPED, 
    XMLPARSER_SUSPENDED 
} XMLReadyState; 

// Some parser flags which can be OR'd together. 
// By default all flags are clear. 

typedef enum 
{ 
    XMLFLAG_CASEINSENSITIVE = 1, 
    XMLFLAG_NOWHITESPACE = 2, 
    XMLFLAG_VALIDATION = 16,  // whether to load and process DTDs 
    XMLFLAG_PROCESSNAMESPACES = 32, 
    XMLFLAG_EXPANDENTITIES = 128, 
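
Since the flags are distinct bits that can be OR'd together, combining and testing them might look like the sketch below. The flag names and values are taken from this spec as far as they are legible; the XMLFlags type name and the SetFlag, ClearFlag and HasFlag helpers are hypothetical and only illustrate the bit arithmetic.

```cpp
// Flag values as listed in this spec; the type name is an assumption.
enum XMLFlags {
    XMLFLAG_CASEINSENSITIVE   = 1,
    XMLFLAG_NOWHITESPACE      = 2,
    XMLFLAG_PROCESSNAMESPACES = 32,
    XMLFLAG_EXPANDENTITIES    = 128
};

// Turn one flag on, leaving the others unchanged.
inline unsigned long SetFlag(unsigned long flags, XMLFlags f)
{
    return flags | static_cast<unsigned long>(f);
}

// Turn one flag off, leaving the others unchanged.
inline unsigned long ClearFlag(unsigned long flags, XMLFlags f)
{
    return flags & ~static_cast<unsigned long>(f);
}

// Test whether a flag is currently set.
inline bool HasFlag(unsigned long flags, XMLFlags f)
{
    return (flags & static_cast<unsigned long>(f)) != 0;
}
```

Starting from 0 (all flags clear, the documented default), setting XMLFLAG_NOWHITESPACE and XMLFLAG_PROCESSNAMESPACES gives the combined value 34.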

Current Status 

As of early June 1998, we're starting Beta 2 work. 

Open Issues 



1. 



Change History 



1998.06.18 


Rev 1.21: Update for all the Beta 2 schedule cuts. No support for 
XMLFLAGS_PROCESSSCHEMA, XMLFLAGS_VALIDATE, or 
XMLFLAGS_EXPANDENTITIES. Remove stuff about atomizing tag names. 


1998.06.1 


Rev 1.15: Stop identifying the Namespace and Schema Nodefactories as 
distinct node factories. They are now just standard parser features. Eliminate 
pReserved parameter in favor of pNS and pSchema. Fix up flag definitions. 


1998.06.09 


Extensive editing in response to CLovett feedback. Null-terminated and 
atomized CreateNode strings for Java compatibility. 




1998.06.06 


PM (CFranks) finally takes over this document. First round of documenting 
changes: standard, namespace, and schema node factories, 
dwSubType on CreateNode, automatic entity expansion feature of schema node 
factory. 


3/19/98 


Fixed buffer overrun bug in PushData usage, and added xml:space and xml:lang 
implementation. So for these attributes you now get the XML_XMLSPACE 
and XML_XMLLANG node type instead of XML_ATTRIBUTE. 


3/15/98 


Added loadDTD, and added new parser flags. 




Added StartDTDSubset and EndDTDSubset and added implementation for 
GetErrorInfo. 


3/4/98 


Added StartDTD and EndDTD and added type result field to CreateNode. Also 
added [illegible]. 


3/1/98 


Added Load method for [illegible] clients, and added flags to control how 
parser works and added unique attribute checking to parser implementation. 


2/28/98 


Added SetURL(base,url) for secure XML download support. 


2/27/98 


Fixed memory leaks and added new void* reserved pointer to CreateNode. 


2/27/98 


Changed Reset to also reset NodeFactory and Root node. 


2/26/98 


Added new args to EndChildren and EndAttributes. 


2/25/98 


Implemented encoding support. Provided first implementation for testing with. 
Added GetAbsolutePosition. Changed GetLastError to return the last 
HRESULT and added GetErrorInfo. Added more clarifications based on 
feedback so far. 


2/19/98 


Changed error handling to use IErrorInfo and changed GetLastError to just 
return HRESULT and added GetErrorInfo to return the IErrorInfo. Also 
changed Run to take a long number of characters instead of kilobytes. Added 
Suspend/Resume and removed SetURL. Changed name of interface from 
IParser to IXMLParser. Removed Resume since this is redundant with Run. 
Added Reset. 
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